32 research outputs found

    2.5K-Graphs: from Sampling to Generation

    Understanding network structure and having access to realistic graphs plays a central role in computer and social networks research. In this paper, we propose a complete and practical methodology for generating graphs that resemble a real graph of interest. The metrics of the original topology we target to match are the joint degree distribution (JDD) and the degree-dependent average clustering coefficient ($\bar{c}(k)$). We start by developing efficient estimators for these two metrics based on a node sample collected via either independence sampling or random walks. Then, we process the output of the estimators to ensure that the target properties are realizable. Finally, we propose an efficient algorithm for generating topologies that have the exact target JDD and a $\bar{c}(k)$ close to the target. Extensive simulations using real-life graphs show that the graphs generated by our methodology are similar to the original graph with respect to not only the two target metrics, but also a wide range of other topological metrics; furthermore, our generator is orders of magnitude faster than state-of-the-art techniques.
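    As a point of reference for the two target metrics, the sketch below computes the JDD and $\bar{c}(k)$ exactly on a small synthetic networkx graph; it is not the paper's sampling-based estimator or generator, and the function names and example graph are placeholders.

```python
# Minimal sketch: compute the two target metrics of 2.5K-Graphs exactly on a
# known graph. JDD counts edges between degree-k and degree-l nodes; c_bar(k)
# averages the local clustering coefficient over nodes of degree k.
from collections import Counter, defaultdict
import networkx as nx

def joint_degree_distribution(G):
    """Return JDD[(k, l)] = number of edges whose endpoints have degrees k and l."""
    jdd = Counter()
    for u, v in G.edges():
        k, l = sorted((G.degree(u), G.degree(v)))
        jdd[(k, l)] += 1
    return jdd

def degree_dependent_clustering(G):
    """Return c_bar[k] = average local clustering coefficient of degree-k nodes."""
    local = nx.clustering(G)              # per-node clustering coefficients
    by_degree = defaultdict(list)
    for node, c in local.items():
        by_degree[G.degree(node)].append(c)
    return {k: sum(cs) / len(cs) for k, cs in by_degree.items()}

if __name__ == "__main__":
    G = nx.barabasi_albert_graph(1000, 3, seed=1)   # stand-in for a real topology
    print(list(joint_degree_distribution(G).items())[:5])
    print(sorted(degree_dependent_clustering(G).items())[:5])
```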

    A Network Coding Approach to Loss Tomography

    Network tomography aims at inferring internal network characteristics based on measurements at the edge of the network. In loss tomography, in particular, the characteristic of interest is the loss rate of individual links, and multicast and/or unicast end-to-end probes are typically used. Independently, recent advances in network coding have shown that there are advantages from allowing intermediate nodes to process and combine, in addition to just forward, packets. In this paper, we study the problem of loss tomography in networks with network coding capabilities. We design a framework for estimating link loss rates, which leverages network coding capabilities, and we show that it improves several aspects of tomography including the identifiability of links, the trade-off between estimation accuracy and bandwidth efficiency, and the complexity of probe path selection. We discuss the cases of inferring link loss rates in a tree topology and in a general topology. In the latter case, the benefits of our approach are even more pronounced compared to standard techniques, but we also face novel challenges, such as dealing with cycles and multiple paths between sources and receivers. Overall, this work makes the connection between active network tomography and network coding.
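    To illustrate the kind of identifiability gain coded probes can provide (a minimal sketch under simplified assumptions, not the paper's framework), consider the smallest coded topology: two sources send probes x1 and x2 over links 1 and 2 to an intermediate node, which XOR-combines whatever arrived and forwards it over link 3 to a receiver. Observing which combination arrives makes all three link success rates solvable in closed form; the loss rates, probe counts, and function names below are illustrative.

```python
# Simulate probes on the two-source inverted tree and recover per-link success
# probabilities from the observed outcome frequencies.
import random

def run_probes(a1, a2, a3, n, rng):
    """Simulate n probe slots; count which payload reaches the receiver."""
    counts = {"x1": 0, "x2": 0, "x1^x2": 0, "none": 0}
    for _ in range(n):
        got1 = rng.random() < a1                     # x1 survives link 1
        got2 = rng.random() < a2                     # x2 survives link 2
        if (got1 or got2) and rng.random() < a3:     # coded packet survives link 3
            counts["x1^x2" if (got1 and got2) else ("x1" if got1 else "x2")] += 1
        else:
            counts["none"] += 1
    return counts

def estimate_links(counts, n):
    """Closed-form estimators, e.g. P(x1^x2) / [P(x1^x2) + P(x1)] = a2."""
    p12, p1, p2 = counts["x1^x2"] / n, counts["x1"] / n, counts["x2"] / n
    a2 = p12 / (p12 + p1)
    a1 = p12 / (p12 + p2)
    a3 = p12 / (a1 * a2)
    return a1, a2, a3

if __name__ == "__main__":
    n = 200_000
    counts = run_probes(a1=0.9, a2=0.8, a3=0.95, n=n, rng=random.Random(0))
    print(estimate_links(counts, n))   # should be close to (0.9, 0.8, 0.95)
```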

    Walking in Facebook: A Case Study of Unbiased Sampling of OSNs

    With more than 250 million active users [1], Facebook (FB) is currently one of the most important online social networks. Our goal in this paper is to obtain a representative (unbiased) sample of Facebook users by crawling its social graph. In this quest, we consider and implement several candidate techniques. Two approaches that are found to perform well are the Metropolis-Hastings random walk (MHRW) and a re-weighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the “ground truth” (UNI, obtained through true uniform sampling of FB userIDs). In contrast, the traditional Breadth-First-Search and Random Walk (without re-weighting) perform quite poorly, producing substantially biased results. In addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these can be used to effectively determine when a random walk sample is of adequate size and quality for subsequent use (i.e., when it is safe to cease sampling). Using these methods, we collect the first, to the best of our knowledge, unbiased sample of Facebook. Finally, we use one of our representative datasets, collected through MHRW, to characterize several key properties of Facebook. Index Terms—Measurements, online social networks, Facebook, graph sampling, crawling, bias.
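    The core of the MHRW approach is a standard Metropolis-Hastings acceptance rule that turns a degree-biased neighbor walk into one with a uniform stationary distribution. A minimal sketch is shown below, run on a synthetic networkx graph rather than a Facebook crawl; the burn-in and walk length are illustrative, not the paper's settings.

```python
# Metropolis-Hastings random walk: accept a move u -> v with probability
# min(1, deg(u)/deg(v)); rejected moves repeat the current node (self-loop).
import random
import networkx as nx

def mhrw_sample(G, start, length, burn_in=1000, rng=random.Random(0)):
    """Return a list of visited nodes whose stationary distribution is uniform."""
    u, sample = start, []
    for step in range(burn_in + length):
        v = rng.choice(list(G.neighbors(u)))
        if rng.random() <= G.degree(u) / G.degree(v):
            u = v                              # accept; otherwise stay at u
        if step >= burn_in:
            sample.append(u)
    return sample

if __name__ == "__main__":
    G = nx.barabasi_albert_graph(5000, 3, seed=1)
    sample = mhrw_sample(G, start=0, length=20_000)
    # The sample's mean degree should approach the true mean degree, whereas an
    # unadjusted random walk would over-represent high-degree nodes.
    print(sum(G.degree(u) for u in sample) / len(sample))
    print(2 * G.number_of_edges() / G.number_of_nodes())
```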

    Practical recommendations on crawling online social networks

    Our goal in this paper is to develop a practical framework for obtaining a uniform sample of users in an online social network (OSN) by crawling its social graph. Such a sample allows us to estimate any user property and some topological properties as well. To this end, first, we consider and compare several candidate crawling techniques. Two approaches that can produce approximately uniform samples are the Metropolis-Hastings random walk (MHRW) and a re-weighted random walk (RWRW). Both have pros and cons, which we demonstrate through a comparison to each other as well as to the “ground truth”. In contrast, using Breadth-First-Search (BFS) or an unadjusted Random Walk (RW) leads to substantially biased results. Second, in addition to offline performance assessment, we introduce online formal convergence diagnostics to assess sample quality during the data collection process. We show how these diagnostics can be used to effectively determine when a random walk sample is of adequate size and quality. Third, as a case study, we apply the above methods to Facebook and we collect the first, to the best of our knowledge, representative sample of Facebook users. We make it publicly available and employ it to characterize several key properties of Facebook.
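    The RWRW alternative keeps the simple random walk, which visits nodes roughly proportionally to their degree, and corrects the bias afterwards with importance weights of 1/degree. The sketch below shows this Hansen-Hurwitz-style re-weighting on a synthetic graph; the graph, walk length, and function names are illustrative, not the paper's crawler.

```python
# Re-weighted random walk (RWRW): estimate a population mean from a degree-biased
# walk by weighting each visited node with 1/degree.
import random
import networkx as nx

def random_walk(G, start, length, rng=random.Random(0)):
    """Simple random walk: at each step move to a uniformly chosen neighbor."""
    u, walk = start, []
    for _ in range(length):
        u = rng.choice(list(G.neighbors(u)))
        walk.append(u)
    return walk

def rwrw_estimate(G, walk, f):
    """Estimate the mean of f over all nodes, correcting the walk's degree bias."""
    weights = [1.0 / G.degree(u) for u in walk]
    return sum(w * f(u) for w, u in zip(weights, walk)) / sum(weights)

if __name__ == "__main__":
    G = nx.barabasi_albert_graph(5000, 3, seed=1)
    walk = random_walk(G, start=0, length=50_000)
    naive = sum(G.degree(u) for u in walk) / len(walk)      # degree-biased average
    corrected = rwrw_estimate(G, walk, f=G.degree)          # re-weighted average
    true_mean = 2 * G.number_of_edges() / G.number_of_nodes()
    print(naive, corrected, true_mean)
```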

    A Network Coding Approach to Network Tomography

    Network tomography aims at inferring internal network characteristics based on measurements at the edge of the network. In loss tomography, in particular, the characteristic of interest is the loss rate of individual links. There is a significant body of work dedicated to this problem using multicast and/or unicast end-to-end probes. Independently, recent advances in network coding have shown that there are several advantages from allowing intermediate nodes to process and combine, in addition to just forward, packets. In this paper, we revisit the problem of loss tomography in networks that have network coding capabilities. We design a novel framework for estimating link loss rates, which leverages network coding capabilities to improve several aspects of the tomography problem, including the identifiability of links, the tradeoff between accuracy of estimation and bandwidth efficiency, and the complexity of probe path selection. We present first the case of tree topologies and then the case of general graphs. In the latter case, the benefits of our approach are even more pronounced compared to standard techniques, but we also face novel challenges, such as dealing with cycles and multiple paths between sources and receivers.

    A network coding approach to IP traceback

    Traceback schemes aim at identifying the source(s) of a sequence of packets and the nodes these packets traversed. This is useful for tracing the sources of high-volume traffic, e.g., in Distributed Denial-of-Service (DDoS) attacks. In this paper, we are particularly interested in Probabilistic Packet Marking (PPM) schemes, where intermediate nodes probabilistically mark packets with information about their identity and the receiver uses information from several packets to reconstruct the paths they have traversed. Our work is inspired by two observations. First, PPM is essentially a coupon collector’s problem [1], [2]. Second, the coupon collector’s problem significantly benefits from network coding ideas [3], [4]. Based on these observations, we propose a network coding-based approach (PPM+NC) that marks packets with random linear combinations of router IDs, instead of individual router IDs. We demonstrate its benefits through analysis. We then propose a practical PPM+NC scheme based on the main PPM+NC idea, but also taking into account the limited bit budget in the IP header available for marking and other practical constraints. Simulation results show that our scheme significantly reduces the number of packets needed to reconstruct the attack graph, in both single- and multi-path scenarios, thus increasing the speed of tracing the attack back to its source(s).
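    The coupon-collector intuition can be checked with a small simulation, sketched below under strong simplifications: plain PPM is modeled as each packet revealing one uniformly chosen router ID, and the coded variant as each packet carrying a random GF(2) combination of the n router IDs, collected until the combinations reach full rank. This is not the paper's header encoding; the path length and marking model are illustrative.

```python
# Compare packets needed to reconstruct an n-router path: individual IDs
# (coupon collector, ~ n * H_n) vs. random GF(2) combinations (only slightly
# more than n independent combinations are needed).
import random

def packets_plain_ppm(n, rng):
    """Packets until every one of the n router IDs has been seen at least once."""
    seen, count = set(), 0
    while len(seen) < n:
        seen.add(rng.randrange(n))       # simplified: each packet reveals one uniform ID
        count += 1
    return count

def packets_coded_ppm(n, rng):
    """Packets until the received GF(2) combinations of router IDs have full rank."""
    pivots, count = {}, 0                # leading-bit position -> reduced combination
    while len(pivots) < n:
        v = rng.getrandbits(n)           # random combination of the n router IDs
        count += 1
        while v:
            top = v.bit_length() - 1     # position of the leading 1 bit
            if top not in pivots:
                pivots[top] = v          # new independent combination found
                break
            v ^= pivots[top]             # reduce against the basis and continue
    return count

if __name__ == "__main__":
    rng, n, trials = random.Random(0), 32, 200
    print(sum(packets_plain_ppm(n, rng) for _ in range(trials)) / trials)  # roughly n * ln(n)
    print(sum(packets_coded_ppm(n, rng) for _ in range(trials)) / trials)  # close to n
```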